Abstract: Let $Y$ be a Gaussian vector of $\mathbb{R}^n$ with mean $s$ and diagonal
covariance matrix $\Gamma$. Our aim is to estimate both $s$ and the entries
$\sigma_i = \Gamma_{i,i}$, for $i = 1, \dots, n$, on the basis of the observation of two
independent copies of $Y$. Our approach is free of any prior assumption on $s$ but
requires that we know some upper bound $\gamma$ on the ratio $\max_i \sigma_i / \min_i \sigma_i$.
For example, the choice $\gamma = 1$ corresponds to the homoscedastic case where the
components of $Y$ are assumed to have a common (unknown) variance. Conversely, the
choice $\gamma > 1$ corresponds to the heteroscedastic case where the variances of the
components of $Y$ are allowed to vary within some range. Our estimation strategy is
based on model selection. We consider a family $\{S_m \times \Sigma_m, m \in \mathcal{M}\}$ of
parameter sets where $S_m$ and $\Sigma_m$ are linear spaces. To each $m \in \mathcal{M}$, we
associate a pair of estimators $(\hat{s}_m, \hat{\sigma}_m)$ of $(s, \sigma)$ with values in
$S_m \times \Sigma_m$. Then we design a model selection procedure in view of selecting some
$\hat{m}$ among $\mathcal{M}$ in such a way that the Kullback risk of
$(\hat{s}_{\hat{m}}, \hat{\sigma}_{\hat{m}})$ is as close as possible to the minimum of the
Kullback risks among the family of estimators $\{(\hat{s}_m, \hat{\sigma}_m), m \in \mathcal{M}\}$.
Then we derive uniform rates of convergence for the estimator
$(\hat{s}_{\hat{m}}, \hat{\sigma}_{\hat{m}})$ over Hölderian balls. Finally, we carry out a
simulation study in order to illustrate the performances of our estimators in practice.

1. Introduction

Let us consider the statistical framework given by the distribution of a Gaussian
vector $Y$ with mean $s = (s_1, \dots, s_n)' \in \mathbb{R}^n$ and diagonal covariance matrix

$$\Gamma_\sigma = \begin{pmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_n \end{pmatrix}$$

where $\sigma = (\sigma_1, \dots, \sigma_n)' \in (0, \infty)^n$. The vectors $s$ and $\sigma$ are both
assumed to be unknown. Hereafter, for any $t = (t_1, \dots, t_n)' \in \mathbb{R}^n$ and
$\tau = (\tau_1, \dots, \tau_n)' \in (0, \infty)^n$, we denote by $P_{t,\tau}$ the distribution of a
Gaussian vector with mean $t$ and covariance matrix $\Gamma_\tau$ and by
$\mathcal{K}(P_{s,\sigma}, P_{t,\tau})$ the Kullback-Leibler divergence between $P_{s,\sigma}$ and
$P_{t,\tau}$,

$$\mathcal{K}(P_{s,\sigma}, P_{t,\tau}) = \frac{1}{2} \sum_{i=1}^n \left[ \frac{(s_i - t_i)^2}{\tau_i} + \phi\left(\frac{\tau_i}{\sigma_i}\right) \right]$$

where $\phi(u) = \log u + 1/u - 1$, for $u > 0$. Note that, if the $\sigma_i$'s are known and
constant, the Kullback-Leibler divergence becomes, up to a constant factor, the squared
$L^2$-norm and, in expectation, corresponds to the quadratic risk.
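
The divergence above is easy to evaluate numerically. The following is a minimal sketch, not part of the original paper: the function names `phi` and `kullback` are illustrative, and the vectors `sigma` and `tau` are assumed to hold variances (not standard deviations), as in the text.

```python
import numpy as np

def phi(u):
    """phi(u) = log(u) + 1/u - 1, defined for u > 0."""
    u = np.asarray(u, dtype=float)
    return np.log(u) + 1.0 / u - 1.0

def kullback(s, sigma, t, tau):
    """Kullback-Leibler divergence K(P_{s,sigma}, P_{t,tau}) between two
    Gaussian vectors with diagonal covariances (sigma and tau are variances)."""
    s, sigma, t, tau = (np.asarray(a, dtype=float) for a in (s, sigma, t, tau))
    return 0.5 * np.sum((s - t) ** 2 / tau + phi(tau / sigma))
```

With equal and constant variances, `kullback` reduces to the squared Euclidean distance between the means divided by twice the common variance, which is the remark made just above.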

Let us suppose that we observe two independent copies of $Y$, namely
$Y^{[1]} = (Y_1^{[1]}, \dots, Y_n^{[1]})'$ and $Y^{[2]} = (Y_1^{[2]}, \dots, Y_n^{[2]})'$. Their
coordinates can be expanded as

$$Y_i^{[j]} = s_i + \sqrt{\sigma_i}\, \varepsilon_i^{[j]}, \quad i = 1, \dots, n \text{ and } j = 1, 2, \tag{1.1}$$

where $\varepsilon^{[1]} = (\varepsilon_1^{[1]}, \dots, \varepsilon_n^{[1]})'$ and
$\varepsilon^{[2]} = (\varepsilon_1^{[2]}, \dots, \varepsilon_n^{[2]})'$ are two independent standard
Gaussian vectors. We are interested here in the estimation of the two vectors $s$ and
$\sigma$. Indeed, their behaviors contain substantial knowledge about the phenomenon
represented by the distribution of $Y$. We have particularly in mind the case of a
variance that stays approximately constant over periods and that can take several
values over the course of the observations. Of course, we want to estimate the mean
$s$ but, in this particular case, we are also interested in recovering the periods of
constancy and the values taken by the variance $\sigma$. The Kullback-Leibler divergence
measures the difference between two distributions $P_{s,\sigma}$ and $P_{t,\tau}$. Thus, it
allows us to deal with the two estimation problems at the same time. More generally,
the aim of this paper is to estimate the pair $(s, \sigma)$ by model selection on the
basis of the observation of $Y^{[1]}$ and $Y^{[2]}$.
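
The sampling scheme (1.1) can be sketched as follows. This is only an illustration under the stated model; the helper name `two_copies` and the use of NumPy's `default_rng` generator are choices made here, not taken from the paper.

```python
import numpy as np

def two_copies(s, sigma, rng):
    """Draw Y^[1] and Y^[2] as in (1.1): Y_i^[j] = s_i + sqrt(sigma_i) * eps_i^[j],
    where sigma holds the variances and the eps are independent standard normals."""
    s = np.asarray(s, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eps = rng.standard_normal((2, s.size))   # one row of noise per copy
    y = s + np.sqrt(sigma) * eps             # broadcasting over the two copies
    return y[0], y[1]

# Usage: rng = np.random.default_rng(0); y1, y2 = two_copies(s, sigma, rng)
```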

For this, we introduce a collection $\mathcal{F} = \{S_m \times \Sigma_m, m \in \mathcal{M}\}$ of
products of linear subspaces of $\mathbb{R}^n$ indexed by a finite or countable set
$\mathcal{M}$. In the sequel, these products will be called models and, for any
$m \in \mathcal{M}$, we will denote by $D_m$ the dimension of $S_m \times \Sigma_m$. To each
$m \in \mathcal{M}$, we will associate a pair of estimators $(\hat{s}_m, \hat{\sigma}_m)$ that is
similar to the maximum likelihood estimator (MLE). It is well known that, if the
$\sigma_i$'s are equal, the estimators of the mean and the variance factor given by
maximization of the likelihood are independent. This fact does not remain true if the
$\sigma_i$'s are not constant. To recover the independence between the estimators of the
mean and the variance, we construct them separately from the two independent copies
$Y^{[1]}$ and $Y^{[2]}$. For the estimator $\hat{s}_m$ of $s$, we take the MLE based on
$Y^{[1]}$ and, for the estimator $\hat{\sigma}_m$ of $\sigma$, we take the MLE based on $Y^{[2]}$.
Thus, for each $m \in \mathcal{M}$, we have a pair of independent estimators
$(\hat{s}_m, \hat{\sigma}_m) = (\hat{s}_m(Y^{[1]}), \hat{\sigma}_m(Y^{[2]}))$ with values in
$S_m \times \Sigma_m$. The Kullback risk of $(\hat{s}_m, \hat{\sigma}_m)$ is given by
$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})]$ and is of order of the sum of two terms,

$$\inf_{(t,\tau) \in S_m \times \Sigma_m} \mathcal{K}(P_{s,\sigma}, P_{t,\tau}) + D_m. \tag{1.2}$$

The first one, called the bias term, represents the capacity of $S_m \times \Sigma_m$ to
approximate the true value of $(s, \sigma)$. The second, called the variance term, is
proportional to the dimension of the model and corresponds to the amount of noise
that we have to control. To warrant a small risk, these two terms have to be small
simultaneously. Indeed, using the Kullback risk as a quality criterion, a good model
is one minimizing (1.2) among $\mathcal{F}$. Clearly, the choice of such a model depends on
the pair of unknown parameters $(s, \sigma)$, which makes good models unavailable to us.
So, we have to construct a procedure to select an index
$\hat{m} = \hat{m}(Y^{[1]}, Y^{[2]}) \in \mathcal{M}$ depending on the data only, such that
$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_{\hat{m}},\hat{\sigma}_{\hat{m}}})]$ is close to the smaller risk

$$R(s, \sigma, \mathcal{F}) = \inf_{m \in \mathcal{M}} E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})].$$

The art of model selection is precisely to provide procedures solely based on the
observations in that way. The classical way consists in minimizing an empirical
penalized criterion stochastically close to the risk. Considering the likelihood
function with respect to $Y^{[1]}$,

$$\forall t \in \mathbb{R}^n,\ \tau \in (0, \infty)^n, \quad \mathcal{L}(t, \tau) = \frac{1}{2} \sum_{i=1}^n \left[ \frac{(Y_i^{[1]} - t_i)^2}{\tau_i} + \log \tau_i \right],$$

we choose $\hat{m}$ as the minimizer over $\mathcal{M}$ of the penalized likelihood criterion

$$\mathrm{Crit}(m) = \mathcal{L}(\hat{s}_m, \hat{\sigma}_m) + \mathrm{pen}(m) \tag{1.3}$$

where pen is a penalty function mapping $\mathcal{M}$ into $\mathbb{R}_+ = [0, \infty)$. In this work,
we give a form for the penalty that yields a pair of estimators
$(\hat{s}_{\hat{m}}, \hat{\sigma}_{\hat{m}})$ with a Kullback risk close to $R(s, \sigma, \mathcal{F})$.

Our approach is free of any prior assumption on $s$ but requires that we know
some upper bound $\gamma \geq 1$ on the ratio

$$\sigma^* / \sigma_* \leq \gamma$$

where $\sigma^*$ (resp. $\sigma_*$) is the maximum (resp. minimum) of the $\sigma_i$'s. The knowledge
of $\gamma$ allows us to deal equivalently with two different cases. First, "$\gamma = 1$"
corresponds to the homoscedastic case where the components of $Y^{[1]}$ and $Y^{[2]}$
are independent with a common variance (i.e. $\sigma_i = \sigma$) which can be unknown.
On the other hand, "$\gamma > 1$" means that the $\sigma_i$'s can be distinct and are allowed to
vary within some range. This situation, in which the variances of the observations are
not all equal, is known as the heteroscedastic case. Heteroscedasticity arises in many
practical situations in which the assumption that the variances of the data are equal is
debatable.

The research field of model selection has seen important developments in the last
decades and it is beyond the scope of this paper to make an exhaustive historical
review of the domain. The interested reader can find a good introduction to model
selection in the first chapters of [17]. The first heuristics in the domain are due to
Mallows [16] for the estimation of the mean in homoscedastic Gaussian regression with
known variance. In a more general Gaussian framework with common known variance,
Barron et al. [7] and Birgé and Massart ([9] and [10]) have designed an adaptive model
selection procedure to estimate the mean for quadratic risk. They provide
non-asymptotic upper bounds for the risk of the selected estimator. When the bound is
of order of the smaller risk among the collection of models, this kind of result is
called an oracle inequality. Baraud [5] has generalized their results to homoscedastic
statistical models with non-Gaussian noise admitting moments of order larger than 2
and a known variance. All these results remain true for a common unknown variance if
some upper bound on it is supposed to be known. Of course, the bigger this bound, the
worse the results. Assuming that $\gamma$ is known does not imply the knowledge of such an
upper bound.

In the homoscedastic Gaussian framework with unknown variance, Akaike has
proposed penalties for estimating the mean for quadratic risk (see [1, 2] and [3]).
Replacing the variance by a particular estimator in his penalty term, Baraud
[5] has obtained oracle inequalities for more general noise than Gaussian and
polynomial collections of models. Recently, Baraud, Giraud and Huet [6] have
constructed penalties able to take into account the complexity of the collection of
models for estimating the mean with quadratic risk in the Gaussian homoscedastic
model with unknown variance. They have also proved results for the estimation
of the mean and the variance factor with Kullback risk. This problem is close to
ours and corresponds to the case "$\gamma = 1$". A motivation for the present work was
to extend their results to the heteroscedastic case "$\gamma > 1$" in order to get oracle
inequalities by minimization of a penalized criterion such as (1.3). Assuming that the
collection of models is not too large, we obtain inequalities with the same flavor
up to a logarithmic factor

$$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_{\hat{m}},\hat{\sigma}_{\hat{m}}})] \leq C \inf_{m \in \mathcal{M}} \left\{ \inf_{(t,\tau) \in S_m \times \Sigma_m} \mathcal{K}(P_{s,\sigma}, P_{t,\tau}) + D_m \log^{1+\varepsilon} D_m \right\} + R \tag{1.4}$$

where $C$ and $R$ are positive constants depending in particular on $\gamma$ and $\varepsilon$ is a
positive parameter.

A non-asymptotic model selection approach for estimation problems in the
heteroscedastic Gaussian model has been studied in only a few papers. In Chapter 6
of [4], Arlot estimates the mean in a heteroscedastic regression framework but for
bounded data. For polynomial collections of models, he uses resampling penalties
to get oracle inequalities for quadratic risk. Recently, Galtchouk and
Pergamenshchikov [14] have provided an adaptive nonparametric estimation procedure for
the mean in a heteroscedastic Gaussian regression model. They obtain an oracle
inequality for the quadratic risk under some regularity assumptions. Closer to
our problem, Comte and Rozenholc [12] have estimated the pair $(s, \sigma)$. Their
estimation procedure is different from ours, which makes the theoretical results
difficult to compare. For instance, they proceed in two steps (one
for the mean and one for the variance) and they give risk bounds separately
for each parameter in $L^2$-norm while we estimate directly the pair $(s, \sigma)$ for
Kullback risk.
As described in [8], one of the main advantages of inequalities such as (1.4)
is that they allow us to derive uniform convergence rates for the risk of the
selected estimator over many classes of smoothness. Considering a collection of
histogram models, we provide convergence rates over Hölderian balls. Indeed,
for $\alpha_1, \alpha_2 \in (0, 1]$, if $s$ is $\alpha_1$-Hölderian and $\sigma$ is $\alpha_2$-Hölderian, we prove
that the risk of $(\hat{s}_{\hat{m}}, \hat{\sigma}_{\hat{m}})$ converges with a rate of order of

$$\left( \frac{n}{\log^{1+\varepsilon} n} \right)^{-2\alpha/(2\alpha+1)}$$

where $\alpha = \min\{\alpha_1, \alpha_2\}$ is the worst regularity. To put this rate in perspective,
we can think of the homoscedastic case with only one observation of $Y$. Indeed, in this
case, the optimal rate of convergence in the minimax sense is $n^{-2\alpha/(2\alpha+1)}$ and,
up to a logarithmic loss, our rate is comparable to this one. To our knowledge, our
results in non-asymptotic estimation of the mean and the variance in the
heteroscedastic Gaussian model are new.

The paper is organized as follows. The main results are presented in Section 2.
In Section 3, we carry out a simulation study in order to illustrate the
performances of our estimators in practice with the Kullback risk and the quadratic
risk. The last sections are devoted to the proofs and to some technical results.

2. Main results

We first introduce the collection of models, the estimators and the
procedure. Next, we present the main results, whose proofs can be found in
Section 4. In the sequel, we consider the framework (1.1) and, for the sake of
simplicity, we suppose that there exists an integer $k_n \geq 0$ such that $n = 2^{k_n}$.

2.1. Model collection and estimators

In order to estimate the mean and the variance, we consider linear subspaces of
$\mathbb{R}^n$ constructed as follows. Let $\mathcal{M}$ be a countable or finite set. To each
$m \in \mathcal{M}$, we associate a regular partition $p_m$ of $\{1, \dots, 2^{k_n}\}$ given by the
$|p_m| = 2^{k_m}$ consecutive blocks

$$\{(i-1) 2^{k_n - k_m} + 1, \dots, i\, 2^{k_n - k_m}\}, \quad i = 1, \dots, |p_m|.$$

For any $I \in p_m$ and any $x \in \mathbb{R}^n$, let us denote by $x|_I$ the vector of
$\mathbb{R}^{n/|p_m|}$ with coordinates $(x_i)_{i \in I}$. Then, to each $m \in \mathcal{M}$, we also
associate a linear subspace $E_m$ of $\mathbb{R}^{n/|p_m|}$ with dimension
$1 \leq d_m \leq 2^{k_n - k_m}$. This set of pairs $(p_m, E_m)$ allows us to construct a
collection of models. Hereafter, we identify each $m \in \mathcal{M}$ with its corresponding
pair $(p_m, E_m)$.

For any $m = (p_m, E_m) \in \mathcal{M}$, we introduce the subspace $S_m \subset \mathbb{R}^n$ of the
$E_m$-piecewise vectors,

$$S_m = \{x \in \mathbb{R}^n \text{ such that } \forall I \in p_m,\ x|_I \in E_m\},$$
and the subspace $\Sigma_m \subset \mathbb{R}^n$ of the piecewise constant vectors,

$$\Sigma_m = \left\{ \sum_{I \in p_m} g_I 1_I,\ \forall I \in p_m,\ g_I \in \mathbb{R} \right\}.$$

The dimension of $S_m \times \Sigma_m$ is denoted by $D_m = |p_m|(d_m + 1)$. To estimate the
pair $(s, \sigma)$, we only deal with models $S_m \times \Sigma_m$ constructed in such a way. More
precisely, we consider a collection of products of linear subspaces

$$\mathcal{F} = \{S_m \times \Sigma_m,\ m \in \mathcal{M}\} \tag{2.1}$$

where $\mathcal{M}$ is a set of pairs $(p_m, E_m)$ as above. In the paper, we will often make
the following hypothesis on the collection of models:

$(H_\theta)$ There exists $\theta > 1$ such that

$$\forall m \in \mathcal{M}, \quad n \geq \frac{\theta}{\theta - 1} (\gamma + 2) D_m.$$

This hypothesis avoids handling models whose dimension is too large with respect
to the number of observations.

Let $m \in \mathcal{M}$, we denote by $\pi_m$ the orthogonal projection on $S_m$. We estimate
$(s, \sigma)$ by the pair of independent estimators $(\hat{s}_m, \hat{\sigma}_m) \in S_m \times \Sigma_m$ given by

$$\hat{s}_m = \pi_m Y^{[1]}$$

and

$$\hat{\sigma}_m = \sum_{I \in p_m} \hat{\sigma}_{m,I} 1_I \quad \text{where } \forall I \in p_m,\ \hat{\sigma}_{m,I} = \frac{1}{|I|} \sum_{i \in I} \left( Y_i^{[2]} - (\pi_m Y^{[2]})_i \right)^2.$$

Thus, we get a collection of estimators $\{(\hat{s}_m, \hat{\sigma}_m),\ m \in \mathcal{M}\}$.
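
To make the construction concrete, here is a minimal sketch of the two estimators in the special case where each block of $p_m$ is split into `d_m` equal sub-blocks and $E_m$ is the space of vectors that are constant on these sub-blocks (the histogram case used later in Section 2.3). The function name `block_estimators` and the restriction to power-of-two sizes are assumptions made for this illustration.

```python
import numpy as np

def block_estimators(y1, y2, k_m, d_m):
    """Histogram-model sketch: p_m has 2**k_m blocks, each split into d_m
    sub-blocks of equal length. Returns (s_hat, sigma_hat) as length-n vectors."""
    n = y1.size
    n_blocks = 2 ** k_m
    block_len = n // n_blocks
    sub_len = block_len // d_m
    assert n_blocks * block_len == n and d_m * sub_len == block_len

    def project(y):
        # Orthogonal projection on S_m: average over each sub-block.
        return np.repeat(y.reshape(-1, sub_len).mean(axis=1), sub_len)

    s_hat = project(y1)                       # mean estimated from the first copy
    resid2 = (y2 - project(y2)) ** 2          # squared residuals of the second copy
    sigma_hat = np.repeat(resid2.reshape(n_blocks, block_len).mean(axis=1), block_len)
    return s_hat, sigma_hat
```

The mean estimator only uses `y1` and the variance estimator only uses `y2`, which is what makes the two estimators independent.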

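2.2. Risk upper bound

We first study the risk on a single model to understand its order. Take an
arbitrary $m \in \mathcal{M}$. We define $(s_m, \sigma_m) \in S_m \times \Sigma_m$ by

$$s_m = \pi_m s$$

and

$$\sigma_m = \sum_{I \in p_m} \sigma_{m,I} 1_I \quad \text{where } \forall I \in p_m,\ \sigma_{m,I} = \frac{1}{|I|} \sum_{i \in I} \left[ (s_i - s_{m,i})^2 + \sigma_i \right].$$

Easy computations prove that the pair $(s_m, \sigma_m)$ reaches the minimum of the
Kullback-Leibler divergence on $S_m \times \Sigma_m$,

$$\inf_{(t,\tau) \in S_m \times \Sigma_m} \mathcal{K}(P_{s,\sigma}, P_{t,\tau}) = \mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}) = \frac{1}{2} \sum_{I \in p_m} \sum_{i \in I} \log\left( \frac{\sigma_{m,I}}{\sigma_i} \right). \tag{2.2}$$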
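
The closed form of $\mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m})$ can be checked directly from the definitions of $s_m$ and $\sigma_m$. The short derivation below is a sketch of that computation only (it does not reprove the minimality); it uses $\phi(u) = \log u + 1/u - 1$ and the fact that $\sum_{i \in I} [(s_i - s_{m,i})^2 + \sigma_i] = |I| \sigma_{m,I}$.

```latex
\begin{align*}
\mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m})
&= \frac12 \sum_{I \in p_m} \sum_{i \in I}
   \left[ \frac{(s_i - s_{m,i})^2 + \sigma_i}{\sigma_{m,I}} - 1
          + \log\frac{\sigma_{m,I}}{\sigma_i} \right] \\
&= \frac12 \sum_{I \in p_m} \left[ \frac{|I|\,\sigma_{m,I}}{\sigma_{m,I}} - |I| \right]
   + \frac12 \sum_{I \in p_m} \sum_{i \in I} \log\frac{\sigma_{m,I}}{\sigma_i} \\
&= \frac12 \sum_{I \in p_m} \sum_{i \in I} \log\frac{\sigma_{m,I}}{\sigma_i}.
\end{align*}
```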
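The next proposition allows us to compare this quantity with the Kullback risk
of $(\hat{s}_m, \hat{\sigma}_m)$.

Proposition 1. Let $m \in \mathcal{M}$, if the hypothesis $(H_\theta)$ is fulfilled, then

$$\mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}) \vee \frac{D_m}{4\gamma} \leq E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] \leq \mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}) + \kappa \gamma^2 \theta^2 D_m$$

where $\kappa > 1$ is a constant that can be taken equal to $1 + 2e^{-1}$.

As announced in (1.2), this result shows that the Kullback risk of the pair
$(\hat{s}_m, \hat{\sigma}_m)$ is of order of the sum of a bias term
$\mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m})$ and a variance term which is proportional to $D_m$. Thus,
minimizing the Kullback risk $E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})]$ among
$m \in \mathcal{M}$ corresponds to finding a model that realizes a trade-off between these two
terms.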

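Let pen be a non-negative function on $\mathcal{M}$ and let $\hat{m} \in \mathcal{M}$ be any minimizer of
the penalized criterion,

$$\hat{m} \in \operatorname*{argmin}_{m \in \mathcal{M}} \left\{ \mathcal{L}(\hat{s}_m, \hat{\sigma}_m) + \mathrm{pen}(m) \right\}. \tag{2.3}$$

In the sequel, we denote by $(\tilde{s}, \tilde{\sigma}) = (\hat{s}_{\hat{m}}, \hat{\sigma}_{\hat{m}})$ the selected
pair of estimators. It satisfies the following result:

Theorem 2. Under the hypothesis $(H_\theta)$, suppose there exist $A, B > 0$ such
that, for any $(k, d) \in \mathbb{N}^2$,

$$M_{k,d} = \mathrm{Card}\{m \in \mathcal{M} \text{ such that } |p_m| = 2^k \text{ and } d_m = d\} \leq A(1 + d)^B \tag{2.4}$$

where $\mathcal{M}$ is the set defined at the beginning of Section 2.1. Moreover, assume
that there exist $\delta, \varepsilon > 0$ such that

$$D_m \leq \frac{5 \delta \gamma n}{\log^{1+\varepsilon} n}, \quad \forall m \in \mathcal{M}. \tag{2.5}$$

If we take

$$\forall m \in \mathcal{M}, \quad \mathrm{pen}(m) = \left( \gamma\theta + \log^{1+\varepsilon} D_m \right) D_m \tag{2.6}$$

then

$$E[\mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}})] \leq C \inf_{m \in \mathcal{M}} \left\{ \mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}) + D_m \log^{1+\varepsilon} D_m \right\} + R \tag{2.7}$$
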

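where $R = R(\gamma, \theta, A, B, \varepsilon, \delta)$ is a positive constant and $C$ can be taken equal to

$$C = 2 \left( 1 + \frac{\gamma\theta + \kappa \gamma^2 \theta^2}{\log^{1+\varepsilon} 2} \right).$$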

The inequality (2.7) is close to an oracle inequality up to a logarithmic factor.
Thus, considering the penalty (2.6) whose order is slightly larger than the
dimension of the model, the risk of the estimator provided by the criterion (1.3)
is comparable to the minimum among the collection of models $\mathcal{F}$.
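
The selection rule (2.3) with the penalty (2.6) can be sketched for histogram models as follows. This is an illustration under several assumptions that are not in the original text: the function name `select_model`, the default values of `theta` and `eps`, the restriction to models with $2^{k}$ blocks and a power-of-two number `d` of sub-blocks per block, and the reading of the criterion as being computed on the first copy `y1`. Assumption (2.5) is not enforced here; only $(H_\theta)$ is used to filter models.

```python
import numpy as np

def select_model(y1, y2, gamma, theta=2.0, eps=0.01):
    """Minimize L(s_hat_m, sigma_hat_m) + pen(m) over histogram models.
    Returns (criterion, k_m, d_m, s_hat, sigma_hat) for the selected model."""
    n = y1.size
    k_n = n.bit_length() - 1
    assert n == 2 ** k_n, "n is assumed to be a power of two"
    best = None
    for k_m in range(k_n + 1):
        block_len = n >> k_m
        d = 1
        while d <= block_len:
            D_m = (2 ** k_m) * (d + 1)
            # Hypothesis (H_theta): n >= theta / (theta - 1) * (gamma + 2) * D_m
            if n >= theta / (theta - 1.0) * (gamma + 2.0) * D_m:
                sub_len = block_len // d
                proj1 = np.repeat(y1.reshape(-1, sub_len).mean(axis=1), sub_len)
                proj2 = np.repeat(y2.reshape(-1, sub_len).mean(axis=1), sub_len)
                var_blocks = ((y2 - proj2) ** 2).reshape(-1, block_len).mean(axis=1)
                sigma_hat = np.repeat(var_blocks, block_len)
                lik = 0.5 * np.sum((y1 - proj1) ** 2 / sigma_hat + np.log(sigma_hat))
                pen = (gamma * theta + np.log(D_m) ** (1.0 + eps)) * D_m
                if best is None or lik + pen < best[0]:
                    best = (lik + pen, k_m, d, proj1, sigma_hat)
            d *= 2
    return best
```

Because $(H_\theta)$ forces every sub-block to contain several observations, the estimated variances are positive almost surely and the logarithm in the criterion is well defined.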
2.3. Convergence rate

One of the main advantages of an inequality such as (2.7) is that it gives uniform
convergence rates with respect to many well-known classes of smoothness. To
illustrate this, we consider the particular case of regression on a fixed design.
For example, in the framework (1.1), we suppose that

$$\forall\, 1 \leq i \leq n, \quad s_i = s_r(i/n) \text{ and } \sigma_i = \sigma_r(i/n),$$

where $s_r$ and $\sigma_r$ are two unknown functions that map $[0, 1]$ to $\mathbb{R}$.

In this section, we handle the normalized Kullback-Leibler divergence

$$\mathcal{K}_n(P_{s,\sigma}, P_{t,\tau}) = \frac{1}{n} \mathcal{K}(P_{s,\sigma}, P_{t,\tau}),$$

and, for any $\alpha \in (0, 1]$ and any $L > 0$, we denote by $\mathcal{H}_\alpha(L)$ the space of the
$\alpha$-Hölderian functions with constant $L$ on $[0, 1]$,

$$\mathcal{H}_\alpha(L) = \{f : [0, 1] \to \mathbb{R} : \forall x, y \in [0, 1],\ |f(x) - f(y)| \leq L |x - y|^\alpha\}.$$

Moreover, we consider a collection of models $\mathcal{F}^{PC}$ as described in Section 2.1
such that, for any $m \in \mathcal{M}$, $E_m$ is the space of dyadic piecewise constant functions
on $d_m$ blocks. More precisely, let $m = (p_m, E_m) \in \mathcal{M}$ and consider the regular
dyadic partition $p'_m$ with $|p_m| d_m$ blocks that is a refinement of $p_m$. We define
$S_m$ as the space of the piecewise constant functions on $p'_m$,

$$S_m = \left\{ f = \sum_{I \in p'_m} f_I 1_I \text{ such that } \forall I \in p'_m,\ f_I \in \mathbb{R} \right\},$$

and $\Sigma_m$ as the space of the piecewise constant functions on $p_m$,

$$\Sigma_m = \left\{ g = \sum_{I \in p_m} g_I 1_I \text{ such that } \forall I \in p_m,\ g_I \in \mathbb{R} \right\}.$$

Then, the collection of models that we consider is

$$\mathcal{F}^{PC} = \{S_m \times \Sigma_m,\ m \in \mathcal{M}\}.$$

Note that this collection satisfies (2.4) with $A = 1$ and $B = 0$. The following
result gives a uniform convergence rate for $(\tilde{s}, \tilde{\sigma})$ over Hölderian balls.

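Proposition 3. Let $\alpha_1, \alpha_2 \in (0, 1]$, $L_1, L_2 > 0$ and assume that $(H_\theta)$ is fulfilled.
Consider the collection of models $\mathcal{F}^{PC}$ and $\delta, \varepsilon > 0$ such that, for any
$m \in \mathcal{M}$,

$$D_m \leq \frac{5 \delta \gamma n}{\log^{1+\varepsilon} n}.$$

Denoting by $(\tilde{s}, \tilde{\sigma})$ the estimator selected via the penalty (2.6), if $n$ satisfies

then

$$\sup_{(s_r, \sigma_r) \in \mathcal{H}_{\alpha_1}(L_1) \times \mathcal{H}_{\alpha_2}(L_2)} E[\mathcal{K}_n(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}})] \leq C \left( \frac{n}{\log^{1+\varepsilon} n} \right)^{-2\alpha/(2\alpha+1)}$$

where $\alpha = \min\{\alpha_1, \alpha_2\}$ and $C$ is a constant which depends on $\alpha_1$, $\alpha_2$, $L_1$, $L_2$,
$\theta$, $\gamma$, $\sigma_*$, $\delta$ and $\varepsilon$.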

For the estimation of the mean $s$ in quadratic risk with one observation of $Y$,
Galtchouk and Pergamenshchikov [14] have computed the heteroscedastic minimax
risk. Under some assumptions on the regularity of $\sigma_r$ and assuming that
$s_r \in \mathcal{H}_{\alpha_1}(L_1)$, they show that the order of the optimal rate of convergence in
the minimax sense is $C_{\alpha_1,\sigma}\, n^{-2\alpha_1/(2\alpha_1+1)}$. Concerning the estimation of the
variance vector $\sigma$ in quadratic risk with one observation of $Y$ and unknown mean, Wang
et al. [19] have proved that the order of the minimax rate of convergence for the
estimation of $\sigma$ is $C_{\alpha_1,\alpha_2} \max\{n^{-4\alpha_1}, n^{-2\alpha_2/(2\alpha_2+1)}\}$ once
$s_r \in \mathcal{H}_{\alpha_1}(L_1)$ and $\sigma_r \in \mathcal{H}_{\alpha_2}(L_2)$. For $\alpha_1, \alpha_2 \in (0, 1]$, the maximum
of these two rates is of order $n^{-2\alpha/(2\alpha+1)}$ where $\alpha = \min\{\alpha_1, \alpha_2\}$ is the worst
among the regularities of $s_r$ and $\sigma_r$. Up to a logarithmic term, the rate of
convergence over Hölderian balls given by our procedure recovers this rate for the
Kullback risk.
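
To get a feeling for the size of the logarithmic loss, the two rates can be compared numerically. The sketch below is illustrative only; the function names and the sample values of `n`, `alpha` and `eps` are assumptions, and constants are ignored.

```python
import math

def minimax_rate(n, alpha):
    """Rate n^(-2*alpha/(2*alpha+1)), constants omitted."""
    return n ** (-2.0 * alpha / (2.0 * alpha + 1.0))

def procedure_rate(n, alpha, eps):
    """Rate (n / log^(1+eps) n)^(-2*alpha/(2*alpha+1)), constants omitted."""
    return (n / math.log(n) ** (1.0 + eps)) ** (-2.0 * alpha / (2.0 * alpha + 1.0))

def log_loss(n, alpha, eps):
    """Multiplicative loss of the procedure rate with respect to the minimax rate."""
    return procedure_rate(n, alpha, eps) / minimax_rate(n, alpha)
```

The loss equals $(\log n)^{(1+\varepsilon) \cdot 2\alpha/(2\alpha+1)}$, so it grows only logarithmically with $n$.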

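3. Simulation study

To illustrate our results, we consider the following pairs of functions $(s_r, \sigma_r)$
defined on $[0, 1]$ and, for each one, we specify the true value of $\gamma$:

• M1 ($\gamma = 2$)

$$s_r(x) = \begin{cases} 4 & \text{if } 0 \leq x < 1/4 \\ 0 & \text{if } 1/4 \leq x < 1/2 \\ 2 & \text{if } 1/2 \leq x < 3/4 \\ 1 & \text{if } 3/4 \leq x \leq 1 \end{cases} \quad \text{and} \quad \sigma_r(x) = \begin{cases} 2 & \text{if } 0 \leq x < 1/2 \\ 1 & \text{if } 1/2 \leq x \leq 1 \end{cases},$$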



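• M2 ($\gamma = 1$)

$$s_r(x) = 1 + \sin(2\pi x + \pi/3) \quad \text{and} \quad \sigma_r(x) = 1,$$

• M3 ($\gamma = 7/3$)

$$s_r(x) = 3x/2 \quad \text{and} \quad \sigma_r(x) = 1/2 + 2\sin(4\pi (x \wedge 1/2)^2)/3,$$

• M4 ($\gamma = 2$)

$$s_r(x) = 1 + \sin(4\pi (x \wedge 1/2)) \quad \text{and} \quad \sigma_r(x) = (3 + \sin(2\pi x))/2.$$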
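
The stated values of $\gamma$ for M2, M3 and M4 can be checked numerically as the ratio between the largest and the smallest value of $\sigma_r$ on $[0, 1]$. The sketch below is an illustration: the function names and the grid size are choices made here, and M1 is left out because it is a simple step function.

```python
import numpy as np

# Test functions M2-M4 as stated in the text (x in [0, 1]).
def s_M2(x): return 1.0 + np.sin(2.0 * np.pi * x + np.pi / 3.0)
def sigma_M2(x): return np.ones_like(x)
def s_M3(x): return 1.5 * x
def sigma_M3(x): return 0.5 + 2.0 * np.sin(4.0 * np.pi * np.minimum(x, 0.5) ** 2) / 3.0
def s_M4(x): return 1.0 + np.sin(4.0 * np.pi * np.minimum(x, 0.5))
def sigma_M4(x): return (3.0 + np.sin(2.0 * np.pi * x)) / 2.0

def variance_ratio(sigma_fn, grid_size=100001):
    """Approximate max(sigma_r) / min(sigma_r) on a regular grid of [0, 1]."""
    x = np.linspace(0.0, 1.0, grid_size)
    v = sigma_fn(x)
    return v.max() / v.min()
```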

Throughout this section, we consider the collection of models $\mathcal{F}^{PC}$ and we take
$n = 1024$ (i.e. $k_n = 10$). Let us first present how our procedure performs on
the examples with the true value of $\gamma$ for each simulation, $\varepsilon = 10^{-2}$ and $\delta = 3$
in the assumption (2.5) and the penalty (2.6) with $\theta = 2$. The estimators are
drawn in plain line and the true functions in dotted line.

Fig 1: Estimation of the mean (left) and the variance (right) in the case M1.

Fig 2: Estimation of the mean (left) and the variance (right) in the case M2.

In the case of M1, we can note that the procedure chooses the "good" model
in the sense that, if the pair $(s_r, \sigma_r)$ belongs to a model of $\mathcal{F}^{PC}$, this one is
generally chosen by our procedure. Repeating the simulation 100 000 times in
the framework of M1 shows that, at confidence level 99.9%, the probability of
making this "good" choice is about 0.9978 ($\pm 4 \times 10^{-4}$). Even
if the mean does not belong to one of the $S_m$'s, the procedure recovers the
homoscedastic nature of the observations in the case M2. Over 100 000
simulations in the framework induced by M2, the probability of choosing a
homoscedastic model is around 0.99996 ($\pm 1 \times 10^{-5}$) with a confidence of 99.9%.
For more general frameworks such as M3 and M4, the estimators perform visually
well and detect the changes in the behaviour of the mean and the variance
functions.

Fig 3: Estimation of the mean (left) and the variance (right) in the case M3.

Fig 4: Estimation of the mean (left) and the variance (right) in the case M4.

The parameter $\gamma$ is supposed to be known and is present in the definition
of the penalty (2.6). So, we can naturally ask how important it is in the
procedure. In particular, what happens if we do not have the right value? Table 1
presents some estimations of the ratio

$$E[\mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}})] \Big/ \inf_{m \in \mathcal{M}} E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})]$$

for several values of $\gamma$. These estimated values have been obtained with 500
repetitions for each one. The main part of the computation time is devoted to
the estimation of the oracle's risk. In the cases M1, M3 and M4, the ratio does
not suffer too much from small errors on the knowledge of $\gamma$. The most affected
case is the homoscedastic one, but we see that the best estimation is obtained
for the right value of $\gamma$, as we could expect. More generally, it is interesting to
observe that, even if there is a small error on the value of $\gamma$, the ratio stays
reasonably small.

In the regression framework with heteroscedastic noise, we can be interested
in separate estimations of the mean and the variance functions. Because our
procedure provides a simultaneous estimation of these two functions, we can ask
how our estimators $\tilde{s}$ and $\tilde{\sigma}$ perform individually. Considering the quadratic
risks $E[\|s - \tilde{s}\|^2]$ and $E[\|\sigma - \tilde{\sigma}\|^2]$ of $\tilde{s}$ and $\tilde{\sigma}$ respectively, it is
interesting to compare them to the minimal quadratic risk among the collection of
estimators.
$\gamma$     1      1.5    2      2.5    3
M1    0.98   1.02   1.02   1.04   1.01
M2    1.49   1.59   1.88   2.29   2.89
M3    1.77   1.78   1.81   1.90   1.94
M4    1.25   1.26   1.27   1.32   1.33

Table 1: Ratio between the Kullback risk of $(\tilde{s}, \tilde{\sigma})$ and the one of the oracle



$\gamma$     1      1.5    2      2.5    3
M1    0.98   1.01   0.95   1.04   0.98
M2    1.52   1.67   2.04   2.43   3.04
M3    1.73   1.76   1.82   1.88   1.96
M4    1.47   1.48   1.47   1.47   1.49

Table 2: Ratio between the $L^2$-risk of $\tilde{s}$ and the minimal one among the $\hat{s}_m$'s



$\gamma$     1      1.5    2      2.5    3
M1    1.00   1.06   1.03   1.02   1.01
M2    1.11   1.56   1.68   2.21   3.36
M3    2.02   2.07   2.13   2.20   2.23
M4    1.18   1.37   1.34   1.44   1.49

Table 3: Ratio between the $L^2$-risk of $\tilde{\sigma}$ and the minimal one among the $\hat{\sigma}_m$'s



To illustrate this, we give two sets of estimations of the ratios

$$E[\|s - \tilde{s}\|^2] \Big/ \inf_{m \in \mathcal{M}} E[\|s - \hat{s}_m\|^2] \quad \text{and} \quad E[\|\sigma - \tilde{\sigma}\|^2] \Big/ \inf_{m \in \mathcal{M}} E[\|\sigma - \hat{\sigma}_m\|^2]$$

in the frameworks presented at the beginning of this section. We can observe
from these estimations (Tables 2 and 3) that the quadratic risks of our estimators
are quite close to the minimal ones among the collection of models.



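4. Proofs

For any $I \subset \{1, \dots, n\}$ and any $x, y \in \mathbb{R}^n$, we introduce the notations

$$\langle x, y \rangle_I = \sum_{i \in I} x_i y_i \quad \text{and} \quad \|x\|_I^2 = \sum_{i \in I} x_i^2.$$

Let $m \in \mathcal{M}$, we will use several times in the proofs the fact that, for any $I \in p_m$,

$$|I|\, \hat{\sigma}_{m,I} \geq \sigma_*\, \chi^2(|I| - d_m - 1) \tag{4.1}$$

where $\chi^2(|I| - d_m - 1)$ is a $\chi^2$ random variable with $|I| - d_m - 1$ degrees of
freedom.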
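4.1. Proof of Proposition 1

Recalling (2.2) and using the independence between $\hat{s}_m$ and $\hat{\sigma}_m$, we expand the
Kullback risk of $(\hat{s}_m, \hat{\sigma}_m)$,

$$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] = \frac{1}{2} \sum_{I \in p_m} \sum_{i \in I} E\left[ \frac{(s_i - \hat{s}_{m,i})^2}{\hat{\sigma}_{m,I}} + \phi\left( \frac{\hat{\sigma}_{m,I}}{\sigma_i} \right) \right] \tag{4.2}$$

$$= \frac{1}{2} \sum_{I \in p_m} \sum_{i \in I} \log\left( \frac{\sigma_{m,I}}{\sigma_i} \right) + \frac{1}{2} \sum_{I \in p_m} |I|\, E\left[ \phi\left( \frac{\hat{\sigma}_{m,I}}{\sigma_{m,I}} \right) \right] + \frac{1}{2} \sum_{I \in p_m} E\left[ \frac{1}{\hat{\sigma}_{m,I}} \right] E\left[ \|s_m - \hat{s}_m\|_I^2 \right]$$

$$= \mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}) + E_1 + E_2$$

where

$$E_1 = \frac{1}{2} \sum_{I \in p_m} |I|\, E\left[ \phi\left( \frac{\hat{\sigma}_{m,I}}{\sigma_{m,I}} \right) \right] \quad \text{and} \quad E_2 = \frac{1}{2} \sum_{I \in p_m} E\left[ \frac{1}{\hat{\sigma}_{m,I}} \right] \sum_{i \in I} \pi_{m,i,i}\, \sigma_i, \tag{4.3}$$

and $\pi_{m,i,i}$ denotes the $i$-th diagonal entry of the matrix of $\pi_m$.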



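To upper bound the first expectation, note that

$$\forall I \in p_m, \quad E[\hat{\sigma}_{m,I}] = \sigma_{m,I} - \frac{1}{|I|} \sum_{i \in I} \pi_{m,i,i}\, \sigma_i = \sigma_{m,I} (1 - \rho_I)$$

where

$$\rho_I = \frac{1}{|I|\, \sigma_{m,I}} \sum_{i \in I} \pi_{m,i,i}\, \sigma_i \in (0, 1).$$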

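We apply the technical lemmas of the last section to each block $I \in p_m$ and, by
concavity of the logarithm, we get

$$E\left[ \phi\left( \frac{\hat{\sigma}_{m,I}}{\sigma_{m,I}} \right) \right] \leq \log E\left[ \frac{\hat{\sigma}_{m,I}}{\sigma_{m,I}} \right] + E\left[ \frac{\sigma_{m,I}}{\hat{\sigma}_{m,I}} \right] - 1$$

$$\leq \log(1 - \rho_I) + \frac{1}{1 - \rho_I} \left( 1 + \frac{2\kappa\gamma^2}{|I| - d_m - 2} \right) - 1$$

$$\leq -\rho_I + \frac{1}{1 - \rho_I} \left( 1 + \frac{2\kappa\gamma^2}{|I| - d_m - 2} \right) - 1$$

$$\leq \frac{1}{1 - \rho_I} \left( \rho_I^2 + \frac{2\kappa\gamma^2}{|I| - d_m - 2} \right).$$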
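Using $(H_\theta)$ and the fact that $\rho_I \leq \gamma d_m / |I|$, we obtain

$$E_1 \leq \frac{1}{2} \sum_{I \in p_m} \frac{|I|}{1 - \rho_I} \left( \rho_I^2 + \frac{2\kappa\gamma^2}{|I| - d_m - 2} \right) \leq \frac{\theta}{2} \sum_{I \in p_m} \left( \frac{\gamma^2 d_m^2}{|I|} + 2\kappa\gamma^2\theta \right) \leq \frac{\gamma\theta}{2} |p_m| d_m + \kappa\gamma^2\theta^2 |p_m|. \tag{4.4}$$

The second expectation in (4.3) is easier to upper bound by using (4.1) and the
fact that $d_m \geq 1$,

$$E_2 \leq \frac{1}{2} \sum_{I \in p_m} \frac{|I|}{\sigma_* (|I| - d_m - 3)} \sum_{i \in I} \pi_{m,i,i}\, \sigma_i \leq \frac{\gamma}{2} \sum_{I \in p_m} \frac{|I|\, d_m}{|I| - d_m - 3} \leq \frac{\gamma\theta}{2} |p_m| d_m. \tag{4.5}$$

We now sum (4.4) and (4.5) to obtain

$$E_1 + E_2 \leq \gamma\theta |p_m| d_m + \kappa\gamma^2\theta^2 |p_m| \leq \kappa\gamma^2\theta^2 D_m.$$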

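For the lower bound, the positivity of $\phi$ in (4.2) and the independence between
$\hat{s}_m$ and $\hat{\sigma}_m$ give us

$$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] \geq \frac{1}{2} \sum_{I \in p_m} E\left[ \frac{\|s - \hat{s}_m\|_I^2}{\hat{\sigma}_{m,I}} \right] \geq \frac{1}{2} \sum_{I \in p_m} \frac{E[\|s - \hat{s}_m\|_I^2]}{E[\hat{\sigma}_{m,I}]} \geq \frac{1}{2} \sum_{I \in p_m} |I|\, \frac{\|s - s_m\|_I^2 + \sigma_* d_m}{\|s - s_m\|_I^2 + (|I| - d_m) \sigma^*}.$$

It is obvious that the hypothesis $(H_\theta)$ ensures $d_m \leq |I|/2$. Thus, we get
$\sigma_* d_m \leq (|I| - d_m) \sigma^*$ and

$$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] \geq \frac{1}{2} \sum_{I \in p_m} \frac{|I|\, \sigma_* d_m}{(|I| - d_m) \sigma^*} \geq \frac{|p_m| d_m}{2\gamma} \geq \frac{D_m}{4\gamma}.$$

To conclude, we know that $(\hat{s}_m, \hat{\sigma}_m) \in S_m \times \Sigma_m$ and, by definition of
$(s_m, \sigma_m)$, it implies

$$E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] \geq \mathcal{K}(P_{s,\sigma}, P_{s_m,\sigma_m}).$$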
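4.2. Proof of Theorem 2

We prove the following more general result:

Theorem 4. Let $\alpha \in (0, 1)$ and consider a collection of positive weights
$\{x_m\}_{m \in \mathcal{M}}$. If the hypothesis $(H_\theta)$ is fulfilled and if

$$\forall m \in \mathcal{M}, \quad \mathrm{pen}(m) \geq \gamma\theta D_m + x_m, \tag{4.6}$$

then

$$(1 - \alpha)\, E[\mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}})] \leq \inf_{m \in \mathcal{M}} \left\{ E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] + \mathrm{pen}(m) \right\} + R_1(\mathcal{M}) + R_2(\mathcal{M})$$

where $R_1(\mathcal{M})$ and $R_2(\mathcal{M})$ are defined by
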



/ o , \ L21og(l+dm)J 



meM 

and 



2{a + ^9) + 1 sr-^ f n f a\pm\xm \\ 

In these expressions, $\lfloor \cdot \rfloor$ is the integer part and $C$ is a positive constant that
could be taken equal to 12v2e/(y^— 1). 

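Before proving this result, let us see how it implies Theorem 2. The choice
(2.6) for the penalty function corresponds to $x_m = D_m \log^{1+\varepsilon} D_m$ in (4.6).
Applying the previous theorem with $\alpha = 1/2$ leads us to

$$E[\mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}})] \leq 2 \inf_{m \in \mathcal{M}} \left\{ E[\mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m})] + \mathrm{pen}(m) \right\} + 2C\theta^2\gamma R_1 + 8(\gamma\theta + 1) R_2$$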

with 

C, \ l21og(l+d,„)J 
2Ce^-f^/\^\dml0g{l+dm) \ 
I 
Xfri I 

and 

?^ , /, , \Pm\Xr. 



R2= J2 l^"l°^p -^^;n^°gU 



Using the upper bound on the risk given by Proposition 1, we easily obtain the
coefficient of the infimum in (2.7). Thus, it remains to prove that the two
quantities $R_1$ and $R_2$ can be upper bounded independently of $n$. For this, we denote
by $B' = B + 2\log(2C\theta^2\gamma) + 1$ and we compute

L21og(l+d„)J 
--^ , / 7f /W'o'^/lTJ^Irf.,, loffl I + rt™ 1 \ 

dm)) J 

L2 1og(l+d)J 



^ T.Y. Mk,d2'/'d (2Ce^^2-^''' ^og{l + d) 



(fclog2 + log(l + d))'+' 



kw \{k\og2 + \og{l + d)f^^) 

^ A{R[ + R'l) . 

We have split the sum in two terms, the first one is for d = 1, 

2^' log 2 2^' ^ 1 

^'^ =|^„ (fclog2 + log2)^+^ ^ 1^?2 S (^ + 1)'^' ^ ^ ■ 

The other part $R_1''$ is for $d \geq 2$ and is equal to

/ 1 n ^^ \ L2i°g(i+'i)J 

V V(l + rf)S'2-fe(L21og(l+d)J-l)/2 [ ^°S'-^ + '^> I 

^^0^2 V(fcl°g2 + log(l + d))^+V 

Noting that 1 < log(l + d) s^ [2 log(l + d)\ , we have 

R'i ^ ^2-'=/2^(l + d)^'exp(-eL21og(l + d)Jloglog(l + rf)) 

fe^O d'^2 

V2 



V 2 — 1 ,^„ 



d>2 



We now handle $R_2$. Our choice of $x_m = D_m \log^{1+\varepsilon} D_m$ and the hypothesis (2.5)
imply

\Pm\Xm . ., I l~{5\pm\+iy^ 

57n (d|p,„| + l) ^ 

We recall that, for any $a \in (0, 1)$, if $0 < t < (1 - a)/a$, then $\log(1 + t) \geq at$. Take
$a = (\delta |p_m| + 1)^{-1}$ to obtain

1 111 \Pm\Xm, \ ^ \Pm\Xm ^ ''"— 

log IH 1 > T7T:i r— ^^ — ^ 



5771 / 5((5|p,„| + 1)771 5 ((5 + 1)777 
For any positive t, 1 + t^+' ^ (1 + t)^^'^, then we finally obtain



< 



^ b,„|exp 



. 10^7(^+l)b. 



^ EE^m2 exp loM^+i) 



where we have set 

56*7(5 + 1 



«i = E-K-S^)<" 



fc>0 

and 

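We now have to prove Theorem 4. For an arbitrary $m \in \mathcal{M}$, we begin the
proof by expanding the Kullback-Leibler divergence of $(\tilde{s}, \tilde{\sigma})$,

$$\mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}}) = \frac{1}{2} \sum_{i=1}^n \left[ \frac{(s_i - \tilde{s}_i)^2}{\tilde{\sigma}_i} + \phi\left( \frac{\tilde{\sigma}_i}{\sigma_i} \right) \right]$$

$$= \mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m}) + \left[ \mathcal{L}(\hat{s}_m, \hat{\sigma}_m) - \mathcal{K}(P_{s,\sigma}, P_{\hat{s}_m,\hat{\sigma}_m}) \right] + \left[ \mathcal{L}(\tilde{s}, \tilde{\sigma}) - \mathcal{L}(\hat{s}_m, \hat{\sigma}_m) \right] + \left[ \mathcal{K}(P_{s,\sigma}, P_{\tilde{s},\tilde{\sigma}}) - \mathcal{L}(\tilde{s}, \tilde{\sigma}) \right].$$

By the definition (2.3) of $\hat{m}$, the inequality

$$\mathcal{L}(\tilde{s}, \tilde{\sigma}) - \mathcal{L}(\hat{s}_m, \hat{\sigma}_m) \leq \mathrm{pen}(m) - \mathrm{pen}(\hat{m}) \tag{4.7}$$

is true for any $m \in \mathcal{M}$. The difference between the divergence and the likelihood
can be expressed as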



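$$\mathcal{K}(P_{s,\sigma}, P_{t,\tau}) - \mathcal{L}(t, \tau) = \frac{1}{2} \sum_{i=1}^n \left[ \frac{\sigma_i \left( 1 - (\varepsilon_i^{[1]})^2 \right)}{\tau_i} - \frac{2 \sqrt{\sigma_i}\, \varepsilon_i^{[1]} (s_i - t_i)}{\tau_i} \right] - \frac{1}{2} \sum_{i=1}^n (1 + \log \sigma_i). \tag{4.8}$$
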
Using (4.7) and (4.8), for any $\alpha \in (0, 1)$, we can write

s; IC{Ps,a,Ps^,aJ + pcn(TO) + G{m) 

+ Wi{7h) + W2{m) + Z{m) - pcn(m) 
where, for any $m \in \mathcal{M}$,



(4.9) 



lepr. 



Um.I 






3m ^ '=11/ I 5 



lep 



2^ J / cr„i / 



and 

We split the proof of Theorem 4 into several lemmas.

Lemma 5. For any $m \in \mathcal{M}$, we have

$$E[G(m)] \leq 0.$$

Proof. Let us compute this expectation to obtain the inequality. By independence
between $\varepsilon^{[1]}$ and $\varepsilon^{[2]}$, we get



E 



G{m) 



ep™ 



iep„ 



O'mJ 






It leads to $E[G(m)] = E\left[ E\left[ G(m) \mid \varepsilon^{[2]} \right] \right] \leq 0$. □

In order to control $Z(m)$, we split it into two terms that we study separately,

$$Z(m) = Z_+(m) + Z_-(m)$$

where



-.(".) ^ i E E (e -X (' -^!' 






and 



lep,^ iei \ ^ ™'-' 



1 er -1 -a. 



O^m,/ 



O", 



1(T,„ j->cri 
Lemma 6. Let $m \in \mathcal{M}$ and $x$ be a positive number. Under the hypothesis $(H_\theta)$,
we get



E[{Z+{m)-x)^\ ^ — - — exp ( ^^^-—^ log ( 1 



a " \ 2\p,n\ 

Proof. We begin by setting, for all $1 \leq i \leq n$,



Ti{m) 



(E"=1 (f^j/'^mj- - 1) 



1/2 



and we denote by 



5(m)=^T,(m)(l-4i 



4=1 



We lower bound the function $\phi$ by the remark

Vae (0,1), Vug [a,l], ( --1 I < -0(w) 
\u J a 

Thus, we obtain 



2a\p,n\x 



771 



E(3^-0 ^^fes^lEH^ 



_|_ V '^" ^m,j 



la. 



and we use this inequality to get 



4 = 1 



2M{m) 



2 \ 1/2 n , ^ . 

11 ^m-?E4^)i^-. 



(Jl 






i=i 



^ 



■ir 



max ■ 



4a \ i^n CT,] 



5(m)^ 



To control $S(m)$, we use the inequality (4.2) in [15], conditionally to $\varepsilon^{[2]}$. Let
$u > 0$,



f^..,. ^» ' 


) ^(m)^ ^ u) 


= 


E 


V «^" CTm^iy 






^ 


E 



S{m) ^ , /w/max 



i$Cn (Jjn 



exp 



^ . ^m,i 

■ — mm 

4 i^n (J^ 



By the remark (4.1), we can upper bound it by 



. 



\ *^^ ^m,i 



S{m)l ^u] s^ E 



exp min Xr 

' 47 Iep„^ 






where the $X_I$'s are i.i.d. random variables with a $\chi^2(|I| - d_m - 1)/|I|$ distribution.



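For any $\lambda > 0$, we know that the Laplace transform of $X_I$ is given by

\[
\mathbb{E}\left[e^{-\lambda X_I}\right] = \left(1 + \frac{2\lambda}{|I|}\right)^{-(|I| - d_m - 1)/2}. \tag{4.10}
\]

Let $t > 0$, the following expectation is dominated by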



E 



(z+(m) 



2a / + 



= 


/•OO 

/ P 

Jo 


( it \ 

Z+ (m) ^ 'ru] du 

K 2a J 


s^ 


Jo 


cxp — h - mm Xi 

V V 7 2/ /Gp„ J 


<^ 


/•OO 

/ ^ 

Jo 


( fau t\ V 
max exp — h - Xj \ 

lep^ V V 7 2; ;_ 



Using $(H_\theta)$ and (4.10), we roughly upper bound the maximum by the sum of
the Laplace transforms and we get



E 



(z+(m) 



2a 



< 



V- l\I\ 

^^^a(|/|-d,„-3) 



'^\T\ 



|/|-d,„-3)/2 



< 



7^bm| / n - {d,y^ + 'i)\pm\ , A 



ib« 



Take $t = 2\alpha x/\gamma$ to conclude.

Lemma 7. Let $m \in \mathcal{M}$ and $x$ be a positive number, then

E \{Z(m) - (2a + l)x)A ==: ^^i±le-"^ 

Proof. Note that for all $u > 1$, we have



D 



2dp{u) ^ - - 1 



Let $t > 0$, we handle $Z_-(m)$ conditionally to $\varepsilon^{[2]}$ and, using the previous lower
bound on $\varphi$, we obtain

2a + 1 
Z- (m) > ^ 1 



2a 



< 



v^SU™ 



- 1 



2a + 1 
~2^ 



n 



e;^l'-lU^^t+-E(^- 



4 ^ V*nJ:i 



r[2] 
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Itil 



1=1 

Let us note that 



1 $si 



- \^ ( ^' 



.[2] 



1 ^1 



thus, we can apply the inequality (4.1) from [I't] to get 



. s 2q: + 1 , 
Z_(m) ^ ^- 1 ^ exp(-t/2) 



2a 



This inequality leads us to 

^ , , s 2a + 1 
E (Z_(m) ^t 



-\-OQ 

^ I P(Z_(m) ^ u)du 

' (2a + l)t/a 



< 



2a + 1 



Take $t = \alpha x$ to get the announced result.



□



It remains to control $W_1(m)$ and $W_2(m)$. For the first one, we now prove a
Rosenthal-type inequality.

Lemma 8. Consider any $m \in \mathcal{M}$. Under the hypothesis $(H_\theta)$, for any $x > 0$,
we have



E[(Wi(m)-70A, 



^ Cd 7V|Pm|am 



[2 1og(l+d„)J 



where $\lfloor\cdot\rfloor$ is the integral part and $C$ is a positive constant that could be taken
equal to

C = i^« 43.131. 



/^-l 



Proof. Using the lemma 10 and the remark (4.1), we dominate $W_1(m)$,

\I\drn ^ _ ind, 

< \n- 



W,{m) ^W[{m)^^Y. \l\ r-l ^'' 



1 ?l- |Pm|(l + rfm) 



E^^ 



/ep„ 



where the $F_I$'s are i.i.d. Fisher random variables of parameters $(d_m, n/|p_m| - d_m - 1)$.
We denote by $F_m$ the distribution of the $F_I$'s and we have

\Dra < 7b™Mm ^ ^W'M)\ ^ 7^bmMm ^ 7^-D„ . 



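Take $x > 0$ and an integer $q > 1$, then

\[
\mathbb{E}\left[\left(W_1'(m) - \mathbb{E}[W_1'(m)] - x\right)_+\right] \le \frac{\mathbb{E}\left[\left(W_1'(m) - \mathbb{E}[W_1'(m)]\right)_+^q\right]}{(q-1)\,x^{q-1}}. \tag{4.11}
\]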
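The bound (4.11) is an instance of a general fact: for any random variable $V$, any $x > 0$ and any $q > 1$, integrating Markov's inequality $\mathbb{P}(V \ge u) \le \mathbb{E}[V_+^q]/u^q$ over $u \ge x$ gives $\mathbb{E}[(V - x)_+] \le \mathbb{E}[V_+^q]/((q-1)x^{q-1})$. The sketch below checks this exactly on a toy example. The finite distribution and the helper names are illustrative assumptions and have nothing to do with the actual law of $W_1'(m)$.

```python
from fractions import Fraction

def positive_part(v):
    """Return v_+ = max(v, 0)."""
    return v if v > 0 else 0

def left_side(values, x):
    """E[(V - x)_+] for V uniform on `values`."""
    return sum(positive_part(v - x) for v in values) / len(values)

def right_side(values, x, q):
    """E[V_+^q] / ((q - 1) x^(q - 1)) for V uniform on `values`."""
    moment = sum(positive_part(v) ** q for v in values) / len(values)
    return moment / ((q - 1) * x ** (q - 1))

# Hypothetical centered toy distribution (the values sum to zero).
values = [Fraction(k, 2) for k in (-6, -3, -1, 0, 1, 2, 7)]

# Exact rational arithmetic, so the comparison carries no rounding error.
checks = {
    (x, q): left_side(values, x) <= right_side(values, x, q)
    for x in (Fraction(1, 4), Fraction(1), Fraction(5, 2))
    for q in (2, 3, 4, 5)
}
all_hold = all(checks.values())
```

Larger values of $q$ give a faster decay in $x$ at the price of a higher moment, which is why the proof goes on to choose $q$ as a function of $d_m$.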
We set $V = W_1'(m) - \mathbb{E}[W_1'(m)]$. It is the sum of the independent centered
random variables

\[
X_I = \frac{\gamma n d_m}{n - |p_m|(1 + d_m)}\left(F_I - \mathbb{E}[F_I]\right), \quad I \in p_m.
\]

To dominate $\mathbb{E}[V_+^q]$, we use the theorem 9 in [11]. Let us compute






(n - \Pm\{d,n + 3))2(n - \pm\{d,n + 5)) 



< 2Y0'\PmK 



and so, 



E [Vf] '"^ < ^yi2K'j^0^\p„,\d,nq + qKy/2E max \Xi\'' 



l/« 



where k' - 2(75=1) ■ 

We consider $q = 1 + \lfloor 2\log(1 + d_m) \rfloor$ where $\lfloor\cdot\rfloor$ is the integral part. For this
choice, $q \le 1 + d_m$ and it implies

\[ 2|p_m| q < n - |p_m|(1 + d_m). \]
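The claim $q \le 1 + d_m$ for $q = 1 + \lfloor 2\log(1 + d_m) \rfloor$ is elementary and can be checked mechanically, as in the sketch below. The function name and the tested range of degrees are assumptions made for illustration. The displayed condition $2|p_m|q < n - |p_m|(1 + d_m)$ is the same as $n/|p_m| - d_m - 1 > 2q$, which is the standard requirement for a Fisher variable with second parameter $n/|p_m| - d_m - 1$ to have a finite moment of order $q$.

```python
import math

def rosenthal_order(d_m: int) -> int:
    """Moment order q = 1 + floor(2 * log(1 + d_m)) used in the proof."""
    return 1 + math.floor(2 * math.log(1 + d_m))

# Check q <= 1 + d_m over an assumed range of degrees d_m.
orders = {d: rosenthal_order(d) for d in range(0, 10001)}
violations = [d for d, q in orders.items() if q > 1 + d]
```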

The hypothesis $(H_\theta)$ allows us to make such a choice. We roughly upper bound
the maximum by the sum and we use $(H_\theta)$ to get



E 



maxlX/l"? s^ {jddm)'''^ max |F/ - E[i^/ 



lep 



l£p,i 



^ i^Odm)" 2^-^ (E[F™]'J + b™|E[F,?J) 



€ 



{2l9^dmY 

2 
q2 , ^9 



br, 



{2-fedm){l + 2{q-l)/dm) 
1 - 2\pm\q/{n - |p„i|(l + dm)) 



^ (676'^d,„) \pm\ ■ 
Thus, it gives 



< QK'V2-ie'^ (y/\pm\dmq+\Pm\^'''dmq 



< l2K'V2-ie^^/\^\dm (1 + L21og(l + dm)\) 

Injecting this inequality in (4.11) leads to 
E [{W[{m) - E[W[{m)] - x)^] 



^ Cj9^V\P^\d. 



Cl9^V\P^\dm (1 + 2l0g(l + dm)) 



L21og(l+d„)J 



2x 



n 
Lemma 9. Consider any $m \in \mathcal{M}$ and let $x$ be a positive number. Under the
hypothesis $(H_\theta)$, we have



[(W^2(m)-x)J<:^^^^exp 



71 - (d,„ + 3)|pm| / 2a\p,n\x 



2\p„ 



jn 



Proof. Let us define 



Aim) = J2 



'mil/ 



/epn 



<^mj 



The distribution of $W_2(m)$ conditionally to $\varepsilon^{[2]}$ is Gaussian with mean equal to
$-\alpha A(m)/2$ and variance factor



E 



|pl/2/ ^||2 

|i (J [S Sjn) II J 



let 



If $\zeta$ is a standard Gaussian random variable, it is well known that, for any $\lambda > 0$,

\[ \mathbb{P}\left(\zeta \ge \sqrt{2\lambda}\right) \le e^{-\lambda}. \tag{4.12} \]
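As a quick sanity check of (4.12), the exact Gaussian tail can be compared with $e^{-\lambda}$ on a grid. This is a minimal sketch: the grid of values of $\lambda$ and the helper name are arbitrary choices, and the tail is computed with the complementary error function from the standard library.

```python
import math

def gaussian_upper_tail(t: float) -> float:
    """P(zeta >= t) for a standard Gaussian zeta."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

# Assumed grid: lambda ranges over (0, 20] in steps of 0.01.
lambdas = [0.01 * k for k in range(1, 2001)]
gaps = [
    math.exp(-lam) - gaussian_upper_tail(math.sqrt(2.0 * lam))
    for lam in lambdas
]
min_gap = min(gaps)  # stays positive when (4.12) holds on the whole grid
```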

We apply the Gaussian inequality (4.12) to $W_2(m)$ conditionally to $\varepsilon^{[2]}$,



Vi>0, P I W2im) + -A{m) ^ 



\ 



2*E 

lepm 



tJ {s - s,, 



^ er 



It leads to 



^2(jn) + ^A(m) ^ ,/2iA(m)max-:^ 

2 V i^n CTm. 



:[21 1 € e- 



and thus, by the remark (4.1), 



^t 
W^im) ^ — maxXr 
a. /Gp„ -* 



rl^M scp(w^2(m) ^ 



— max - — 



^ e" 



a i^n am,i 

where the $X_I$'s are i.i.d. random variables with a $\chi^2(|I| - d_m - 1)/|I|$ distribution.
Finally, we integrate with respect to $\varepsilon^{[2]}$ and we get

at 



P(W2(m) ^f) s;e 

We finish as we did for $Z_+(m)$,

W2{m)-^t 
Za 



max exp 

-fepm 



-X, 



7 



< 



< 



+ 00 



E 





a 



max exp 

I&Pm 



au t 






7 2 

^ X -(|7|-d,„-3)/2 



X/ 



du 



^ 7^bm| 

^ exp 



n~ {dm + 3)|pto| 

2\pm\ 



log 1 + 



ib»i 
□

In order to end the proof of theorem 4, we need to put together the results of
the previous lemmas. Because $\gamma \ge 1$, for any $x > 0$, we can write



^ exp 



" .loJl+^^'P-'" 



2b„ 



771 



We now come back to (4.9) and we apply the preceding results to each model. 
Let $m \in \mathcal{M}$, we take



2(2 + a) 
and, recalling (4.6), we get the following inequalities 

(l-a)E[/C(P,,,,Pj.^)] 



^ E[IC{Ps,a,Ps^,aJ] + pen(m) + E 
W2{m) 



Wiim)--f0D,n 



2(2 + a) 



-E 



-E 



•^rh 



2(2 + a) 
Z_(to) - (l + 2a)- 



E 



Z+(m) - 



2(2 + a) 



2(2 + a) 
^ E[/C(m)] + pen(m) + Ri{M) + R2{M) 



(4.13) 



where $R_1(\mathcal{M})$ and $R_2(\mathcal{M})$ are the sums defined in the theorem 4. As the choice
of $m$ is arbitrary, we can take the infimum among $m \in \mathcal{M}$ in the right part
of (4.13).



4.3. Proof of the proposition 3 

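For the collection under consideration, we have $A = 1$ and $R = 0$ in (2.4). Let $m \in \mathcal{M}$, we
denote by $\sigma_m \in \Sigma_m$ the quantity

\[
\sigma_m = \sum_{I \in p_m} \sigma_{m,I} \mathbf{1}_I \quad \text{with} \quad \forall I \in p_m,\ \sigma_{m,I} = \frac{1}{|I|} \sum_{i \in I} \sigma_i.
\]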

The theorem 2 gives us 

E[ICn{Ps^a,P~s,a)] 

^- inf {/C(P,.,,P,„.,,J+Anl0g'+'L'm} + - 
77 mGA-1 77 

<- inf {/C(P,.,,P,„.^,J + Anlog'+'i?m} + - 

77 mSA-l 77 



^C inf 



^GAI 1 2r7tT* 27lCT^ 

because, for any $x > 0$, $\varphi(x) \le (x - 1/x)^2$.



m\\2 I 7-, 1 l+£ 7-1 
f Dm log ^ -Dr, 



R 

77 
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o- - cr„i||2 ^ nL2\pj, 



Thus, we obtain 



Lin. 



If $\alpha_1 < \alpha_2$, we can take



and 



IPni l^^ri 



\Pv 



)--^^§,\vmr--^ 


lo 


n 


/ Lin y/(^+^" 


l) 




\2a,\oi+'n) 




Lin y/(i+2-) 




2cr2 1ogi+^nj 





Dr. 



R 

n 



For $\alpha_1 \ge \alpha_2$, this choice is not allowed because it would imply $d_m = 0$. So, in
this case, we take

, T ,1 I f {Ll<j.+Ll)n \'^^'+"^^'^ 

dm = 1 and \p,n\ = 

[V 2a^\og'^'n J 

In both situations, we obtain the announced result.



5. Technical results 



This section is devoted to some useful technical results. Some notations previ- 
ously introduced can have a different meaning here. 

Lemma 10. Let $\Sigma$ be a positive symmetric $n \times n$-matrix and $\sigma_1, \dots, \sigma_n > 0$
be its eigenvalues. Let $P$ be an orthogonal projection of rank $D \ge 1$. If we
denote $M = P\Sigma P$, then $M$ is a non-negative symmetric matrix of rank $D$ and,
if $\tau_1, \dots, \tau_D$ are its positive eigenvalues, we have

\[ \min_{1 \le i \le n} \sigma_i \le \min_{1 \le i \le D} \tau_i \]

and

\[ \max_{1 \le i \le D} \tau_i \le \max_{1 \le i \le n} \sigma_i. \]
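The mechanism behind lemma 10 is that, for any $x \in \mathrm{im}(P)$, we have $Px = x$ and therefore $\langle Mx, x \rangle = \langle \Sigma x, x \rangle$, so each positive eigenvalue of $M$ is a Rayleigh quotient of $\Sigma$ at a vector of $\mathrm{im}(P)$. The sketch below illustrates this step under simplifying assumptions: $\Sigma$ is taken diagonal (as for the covariance matrix of this paper), $\mathrm{im}(P)$ is the span of a few random vectors, and all names and sizes are hypothetical.

```python
import random

random.seed(0)
n, D = 6, 3

# Eigenvalues of an assumed diagonal matrix Sigma = diag(sigma).
sigma = [random.uniform(0.5, 2.0) for _ in range(n)]
# Random vectors whose span plays the role of im(P).
basis = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(D)]

def rayleigh(x):
    """<Sigma x, x> / ||x||^2 for Sigma = diag(sigma)."""
    numerator = sum(s * xi * xi for s, xi in zip(sigma, x))
    return numerator / sum(xi * xi for xi in x)

quotients = []
for _ in range(1000):
    coeffs = [random.gauss(0.0, 1.0) for _ in range(D)]
    # x is a random element of im(P), so <M x, x> = <Sigma x, x>.
    x = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]
    quotients.append(rayleigh(x))

tol = 1e-12  # guards against floating-point rounding only
within_bounds = all(
    min(sigma) - tol <= r <= max(sigma) + tol for r in quotients
)
```

Since the eigenvectors of $M$ associated with $\tau_1, \dots, \tau_D$ lie in $\mathrm{im}(P)$, the same two-sided bound applies to the $\tau_i$'s.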



Proof. We denote by $\Sigma^{1/2}$ the symmetric square root of $\Sigma$. By a classical result,
$M$ has the same rank, equal to $D$, as $P\Sigma^{1/2}$. On a first side, we have
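\[
\max_{1 \le i \le D} \tau_i
= \sup_{\substack{(x_1, x_2) \in \ker(P) \times \mathrm{im}(P) \\ (x_1, x_2) \ne (0,0)}} \frac{\langle P\Sigma x_2, x_2 \rangle}{\|x_1\|^2 + \|x_2\|^2}
\le \sup_{\substack{x_2 \in \mathrm{im}(P) \\ x_2 \ne 0}} \frac{\langle \Sigma x_2, x_2 \rangle}{\|x_2\|^2}
\le \max_{1 \le i \le n} \sigma_i.
\]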

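On the other side, we can write

\[
\begin{aligned}
\min_{1 \le i \le D} \tau_i
&= \min_{\substack{V \subset \mathbb{R}^n \\ \dim(V) = n - D + 1}} \ \max_{\substack{x \in V \\ x \ne 0}} \frac{\langle Mx, x \rangle}{\|x\|^2}
= \min_{\substack{V \subset \mathbb{R}^n \\ \dim(V) = n - D + 1}} \ \max_{\substack{x \in V \\ x \ne 0}} \frac{\|\Sigma^{1/2} P x\|^2}{\|x\|^2} \\
&\ge \min_{\substack{V \subset \mathbb{R}^n \\ \dim(V) = n - D + 1}} \ \max_{\substack{x \in V \cap \mathrm{im}(P) \\ x \ne 0}} \frac{\|\Sigma^{1/2} x\|^2}{\|x\|^2} \\
&\ge \min_{\substack{V' \subset \mathbb{R}^n \\ \dim(V') \ge 1}} \ \max_{\substack{x \in V' \\ x \ne 0}} \frac{\|\Sigma^{1/2} x\|^2}{\|x\|^2}
= \min_{1 \le i \le n} \sigma_i.
\end{aligned}
\]

□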



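Lemma 11. Let $\varepsilon$ be a standard Gaussian vector in $\mathbb{R}^n$, $a = (a_1, \dots, a_n)' \in \mathbb{R}^n$
and $b_1, \dots, b_n > 0$. We denote by $b^*$ (resp. $b_*$) the maximum (resp. minimum)
of the $b_i$'s. If $n > 2$ and $Z = \sum_{i=1}^n (a_i + \sqrt{b_i}\,\varepsilon_i)^2$, then

\[
\mathbb{E}\left[\frac{\mathbb{E}[Z]}{Z} - 1\right] \le \frac{2\kappa (b^*/b_*)^2}{n - 2}
\]

where $\kappa > 1$ is a constant that can be taken equal to $1 + 2e^{-1} \approx 1.736$.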
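The bound of lemma 11 can be illustrated by simulation. The following Monte Carlo sketch estimates $\mathbb{E}[\mathbb{E}[Z]/Z] - 1$ and compares it with $2\kappa(b^*/b_*)^2/(n-2)$. The dimension, the vectors $a$ and $b$, the seed and the sample size are all arbitrary assumptions, and the estimate is only approximate.

```python
import math
import random

random.seed(1)
n = 10
# Hypothetical parameters a and b of lemma 11.
a = [random.uniform(-1.0, 1.0) for _ in range(n)]
b = [random.uniform(1.0, 2.0) for _ in range(n)]

kappa = 1.0 + 2.0 / math.e
bound = 2.0 * kappa * (max(b) / min(b)) ** 2 / (n - 2)
mean_z = sum(ai * ai + bi for ai, bi in zip(a, b))  # closed form of E[Z]

def draw_z() -> float:
    """One realization of Z = sum_i (a_i + sqrt(b_i) * eps_i)^2."""
    return sum(
        (ai + math.sqrt(bi) * random.gauss(0.0, 1.0)) ** 2
        for ai, bi in zip(a, b)
    )

samples = 20000
estimate = sum(mean_z / draw_z() for _ in range(samples)) / samples - 1.0
```

In the special case $a = 0$ and $b_1 = \dots = b_n = 1$, $Z$ follows a $\chi^2(n)$ distribution and the left-hand side equals $n/(n-2) - 1 = 2/(n-2)$ exactly, so the rate $1/(n-2)$ cannot be improved.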

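Proof. We recall that $\mathbb{E}[Z] = \sum_{i=1}^n (a_i^2 + b_i)$ and that, for any $\lambda > 0$, the Laplace
transform of $(a_i + \sqrt{b_i}\,\varepsilon_i)^2$ is

\[
\mathbb{E}\left[\exp\left(-\lambda (a_i + \sqrt{b_i}\,\varepsilon_i)^2\right)\right] = \exp\left(-\frac{\lambda a_i^2}{1 + 2\lambda b_i} - \frac{1}{2}\log(1 + 2\lambda b_i)\right).
\]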
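This closed form can be checked numerically by integrating against the standard Gaussian density. The sketch below uses a plain trapezoidal rule. The truncation window, the step count and the test triples $(\lambda, a_i, b_i)$ are arbitrary assumptions chosen for illustration.

```python
import math

def laplace_closed_form(lam: float, a: float, b: float) -> float:
    """exp(-lam a^2 / (1 + 2 lam b) - log(1 + 2 lam b) / 2)."""
    scale = 1.0 + 2.0 * lam * b
    return math.exp(-lam * a * a / scale - 0.5 * math.log(scale))

def laplace_numeric(lam, a, b, half_width=12.0, steps=24000):
    """Trapezoidal value of E[exp(-lam (a + sqrt(b) eps)^2)], eps ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        e = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)
        total += weight * math.exp(-lam * (a + math.sqrt(b) * e) ** 2) * density
    return total * h

# Hypothetical test triples (lambda, a_i, b_i).
cases = [(0.3, 1.0, 2.0), (1.5, -0.7, 0.5), (0.05, 2.0, 1.0)]
errors = [abs(laplace_closed_form(*c) - laplace_numeric(*c)) for c in cases]
```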



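Thus, the Laplace transform of $Z$ is equal to

\[
\psi(\lambda) = \mathbb{E}\left[e^{-\lambda Z}\right]
= \exp\left(-\sum_{i=1}^n \frac{\lambda a_i^2}{1 + 2\lambda b_i} - \frac{1}{2}\sum_{i=1}^n \log(1 + 2\lambda b_i)\right)
= e^{-\lambda \mathbb{E}[Z]} \times \exp\left(\sum_{i=1}^n \frac{2\lambda^2 a_i^2 b_i}{1 + 2\lambda b_i} - \frac{1}{2}\sum_{i=1}^n r(2\lambda b_i)\right)
\]

where $r(x) = \log(1 + x) - x$ for all $x > 0$. To compute the expectation of the
inverse of $Z$, we integrate $\psi$ by parts,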



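\[
\mathbb{E}\left[\frac{1}{Z}\right] = \int_0^\infty \psi(\lambda)\,d\lambda
= \int_0^\infty e^{-\lambda \mathbb{E}[Z]} \times \exp\left(\sum_{i=1}^n \frac{2\lambda^2 a_i^2 b_i}{1 + 2\lambda b_i} - \frac{1}{2}\sum_{i=1}^n r(2\lambda b_i)\right) d\lambda
= \frac{1}{\mathbb{E}[Z]} + \frac{1}{\mathbb{E}[Z]} \int_0^\infty f_{a,b}(\lambda)\psi(\lambda)\,d\lambda
\]

where

\[
f_{a,b}(\lambda) = \sum_{i=1}^n \left(\frac{2\lambda b_i^2}{1 + 2\lambda b_i} + \frac{4\lambda a_i^2 b_i (1 + \lambda b_i)}{(1 + 2\lambda b_i)^2}\right).
\]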

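We now upper bound the integral

\[
\begin{aligned}
\mathbb{E}\left[\frac{\mathbb{E}[Z]}{Z} - 1\right]
&= \int_0^\infty f_{a,b}(\lambda)\, \frac{e^{-g_{a,b}(\lambda)}}{\prod_{i=1}^n \sqrt{1 + 2\lambda b_i}}\,d\lambda \\
&\le \int_0^\infty \frac{2n\lambda b^{*2}}{(1 + 2\lambda b_*)^{1 + n/2}}\,d\lambda
+ \int_0^\infty \frac{4b^*(1 + \lambda b^*)}{(1 + 2\lambda b_*)^{1 + n/2}} \times g_{a,b}(\lambda) e^{-g_{a,b}(\lambda)}\,d\lambda
\end{aligned}
\]

where we have set

\[
g_{a,b}(\lambda) = \sum_{i=1}^n \frac{\lambda a_i^2}{1 + 2\lambda b_i}.
\]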

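For any $t > 0$, $te^{-t} \le e^{-1}$. Because $g_{a,b}$ is a positive function and $n > 2$, we
obtain

\[
\begin{aligned}
\mathbb{E}\left[\frac{\mathbb{E}[Z]}{Z} - 1\right]
&\le \int_0^\infty \frac{2n\lambda b^{*2}}{(1 + 2\lambda b_*)^{1 + n/2}}\,d\lambda
+ \int_0^\infty \frac{4b^*(1 + \lambda b^*)}{e\,(1 + 2\lambda b_*)^{1 + n/2}}\,d\lambda \\
&= \frac{2(b^*/b_*)^2}{n - 2} + \frac{4(b^*/b_*)(n - 2 + b^*/b_*)}{e\,n(n - 2)} \\
&\le \frac{2(b^*/b_*)^2}{n - 2}\left(1 + \frac{2(n - 1)}{e\,n}\right) \\
&\le \frac{2(1 + 2e^{-1})(b^*/b_*)^2}{n - 2}.
\end{aligned}
\]

□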
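The two integrals evaluated at the end of the proof of lemma 11 are Beta-type integrals, equal to $2(b^*/b_*)^2/(n-2)$ and $4(b^*/b_*)(n-2+b^*/b_*)/(en(n-2))$ respectively. The sketch below checks both values by numerical quadrature. The values of $n$, $b_*$ and $b^*$, the change of variable and the helper name are assumptions made for this illustration.

```python
import math

n, b_min, b_max = 10, 0.5, 1.5  # hypothetical n, b_* and b^*
rho = b_max / b_min

def integrate_half_line(f, scale, steps=100000):
    """Midpoint rule on (0, inf) after lam = (1 - t) / (scale * t), t in (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        lam = (1.0 - t) / (scale * t)
        total += f(lam) / (scale * t * t)  # |d lam / d t| = 1 / (scale t^2)
    return total * h

def first_integrand(lam):
    return 2.0 * n * lam * b_max ** 2 / (1.0 + 2.0 * lam * b_min) ** (1 + n / 2)

def second_integrand(lam):
    numerator = 4.0 * b_max * (1.0 + lam * b_max)
    return numerator / (math.e * (1.0 + 2.0 * lam * b_min) ** (1 + n / 2))

first_numeric = integrate_half_line(first_integrand, 2.0 * b_min)
second_numeric = integrate_half_line(second_integrand, 2.0 * b_min)

first_closed = 2.0 * rho ** 2 / (n - 2)
second_closed = 4.0 * rho * (n - 2 + rho) / (math.e * n * (n - 2))
final_bound = 2.0 * (1.0 + 2.0 / math.e) * rho ** 2 / (n - 2)
```

The last step of the proof only uses $b^*/b_* \ge 1$ and $(n-1)/n \le 1$, which is why the sum of the two closed forms stays below the final bound.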
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